Learning with Dynamic Programming

Author

  • Peter I. Frazier
Abstract

We consider the role of dynamic programming in sequential learning problems. These problems require deciding which information to collect in order to best support later actions. Such problems are ubiquitous, appearing in simulation, global optimization, revenue management, and many other areas. Dynamic programming offers a coherent framework for understanding and solving Bayesian formulations of these problems. We present the dynamic programming formulation applied to a canonical problem, Bernoulli ranking and selection. We then review other sequential learning problems from the literature and the role of dynamic programming in their analysis.

Frequently in operations research and management science we face uncertainty, whether in the form of uncertain demand for goods, uncertain travel times in networks, or uncertainty about which set of parameters best calibrates a simulation model. Often when facing uncertainty we are also offered the opportunity to collect some information that will mitigate uncertainty's effects. On one hand, collecting information allows us to make better decisions and produce better outcomes. On the other hand, collecting information carries a cost, whether in time or money spent, or in the opportunity cost sacrificed collecting one piece of information when we could have collected another. How can we optimally balance these costs and benefits?

One way to formulate such problems is with Bayesian methods (see, e.g., Berger (1985)), in which we begin with a prior probability distribution that describes our subjective belief about how likely each of the many different possible truths is. Dynamic programming (DP) (Bellman, 1954) offers a general way to formulate and solve such Bayesian sequential learning problems. This DP formulation provides value in three ways. First, in some cases, the DP formulation allows computing the optimal solution. This is usually only true for smaller problems (e.g., the inventory problem considered in Lariviere and Porteus (1999)), because the curse of dimensionality (see, e.g., Powell (2007)) prevents computing the solution to larger problems in a practically feasible amount of time. Second, the DP formulation sometimes allows structural results to be shown theoretically, which either provides intuition about the problem and the behavior of the optimal policy, as in, for example, Ding et al. (2002), or provides a characterization of the optimal policy that may be directly exploited to find an optimal policy, as in sequential hypothesis testing (Wald and Wolfowitz, 1948). Third and finally, the DP formulation sometimes suggests useful heuristics (e.g., knowledge-gradient methods (Frazier, 2009)) for complex and large-scale learning problems.

The sequential learning problems that we consider here have been studied in several separate communities. Within the operations research community, sequential learning problems, as well as the way in which DP can be used to confront them, were recognized in early work by Howard (Howard, 1960, 1966). Within the resulting literature, surveyed in Monahan (1982) and Lovejoy (1991), such sequential learning problems were called Partially Observable Markov Decision Processes (POMDPs). This work on POMDPs was continued in the artificial intelligence and reinforcement learning communities (see, e.g., Cassandra et al. (1995); Kaelbling et al. (1998)), where the emphasis is on general-purpose complexity results and algorithmic strategies that apply to the whole spectrum of POMDPs. Specific applications tend to come from robotics, game playing, and other problems that are thought to be exemplars for the problems that humans face when interacting in a general way with their environment. More recently, these problems have also been called Bayes…
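The Bayesian DP formulation described above can be made concrete on a tiny Bernoulli ranking-and-selection instance. The following is a minimal sketch, not code from the paper: it assumes independent Beta priors on each alternative's success probability, takes the state to be the vector of posterior parameters, and performs backward induction over the remaining sampling budget, with the terminal reward taken to be the posterior mean of the alternative selected as best. All function names and the specific terminal reward are illustrative assumptions.

```python
from functools import lru_cache


def posterior_mean(a, b):
    # Mean of a Beta(a, b) posterior on an alternative's success probability.
    return a / (a + b)


@lru_cache(maxsize=None)
def value(state, budget):
    """Optimal expected terminal reward via backward induction.

    state  : tuple of (a, b) Beta-posterior parameters, one pair per alternative
    budget : number of samples remaining before we must select an alternative
    """
    if budget == 0:
        # Terminal decision: select the alternative with the highest posterior mean.
        return max(posterior_mean(a, b) for a, b in state)
    best = 0.0
    for i, (a, b) in enumerate(state):
        p = posterior_mean(a, b)  # predictive probability that sampling i succeeds
        # Bayesian updates: a success increments a, a failure increments b.
        succ = state[:i] + ((a + 1, b),) + state[i + 1:]
        fail = state[:i] + ((a, b + 1),) + state[i + 1:]
        q = p * value(succ, budget - 1) + (1 - p) * value(fail, budget - 1)
        best = max(best, q)
    return best


# Two alternatives, uniform Beta(1, 1) priors, a budget of 3 samples.
print(value(((1, 1), (1, 1)), 3))
```

The memoized state space here is exactly what the curse of dimensionality refers to: the number of reachable posterior-parameter vectors grows combinatorially in the budget and the number of alternatives, which is why exact solution is limited to small instances.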





Publication date: 2010